NSF PAR Search | NSF Public Access Repository

Optimal Batch-Dynamic kd-trees for Processing-in-Memory with Applications

https://doi.org/10.1145/3694906.3743318

Zhao, Yiwei; Kang, Hongbo; Gu, Yan; Blelloch, Guy E.; Dhulipala, Laxman; McGuffey, Charles; Gibbons, Phillip B. (July 2025, ACM)

Free, publicly accessible full text available July 16, 2026

PIM-trie: A Skew-resistant Trie for Processing-in-Memory

https://doi.org/10.1145/3558481.3591070

Kang, Hongbo; Zhao, Yiwei; Blelloch, Guy E.; Dhulipala, Laxman; Gu, Yan; McGuffey, Charles; Gibbons, Phillip B. (June 2023, ACM)

Memory latency and bandwidth are significant bottlenecks in designing in-memory indexes. Processing-in-memory (PIM), an emerging hardware design approach, alleviates this problem by embedding processors in memory modules, enabling low-latency memory access whose aggregate bandwidth scales linearly with the number of PIM modules. Despite recent work on balanced comparison-based indexes for PIM systems, building efficient tries for PIM remains an open challenge because of tries' inherently unbalanced shape. This paper presents the PIM-trie, the first batch-parallel radix-based index for PIM systems that provides load balance and low communication under adversary-controlled workloads. We introduce trie matching (matching the query trie of a batch against the compressed data trie) as a key building block for PIM-friendly index operations. Our algorithm combines (i) hash-based comparisons for coarse-grained work distribution and elimination and (ii) bit-by-bit comparisons for fine-grained matching. Combined with other techniques (meta-block decomposition, selective recursive replication, and differentiated verification), PIM-trie supports LongestCommonPrefix, Insert, and Delete in O(log P) communication rounds per batch and O(l/w) communication volume per string, where P is the number of PIM modules, l is the string length in bits, and w is the machine word size. Moreover, work and communication are load-balanced among modules with high probability (whp), even under worst-case skew.
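The abstract pairs hash-based comparisons for coarse-grained elimination with bit-by-bit comparisons for fine-grained matching. Below is a minimal, single-machine sketch of that two-phase idea applied to the longest common prefix of two bit strings; it is not the PIM-trie algorithm itself, which matches whole query tries against a data trie distributed across PIM modules. It assumes bit strings are Python `str` values of '0'/'1' characters, uses `hashlib.blake2b` as a stand-in prefix hash (ignoring collisions), and the names `prefix_hash` and `lcp_bits` are hypothetical. The coarse phase only probes word-aligned prefixes (w bits at a time), echoing the abstract's word-level O(l/w) accounting, and the fine phase scans at most one word bit by bit.

```python
import hashlib


def prefix_hash(bits: str, n: int) -> bytes:
    """Hash of the first n bits; a stand-in for a precomputed prefix hash."""
    return hashlib.blake2b(bits[:n].encode(), digest_size=8).digest()


def lcp_bits(a: str, b: str, w: int = 64) -> int:
    """Longest common prefix (in bits) of two '0'/'1' strings.

    Coarse phase: binary search for the largest number of whole w-bit words
    whose prefix hashes agree (assumes no hash collisions).
    Fine phase: compare bit by bit inside the next word only.
    """
    limit = min(len(a), len(b))
    lo, hi = 0, limit // w  # invariant: the first lo words are known to match
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if prefix_hash(a, mid * w) == prefix_hash(b, mid * w):
            lo = mid          # first mid words match; search further right
        else:
            hi = mid - 1      # a mismatch lies within the first mid words
    i = lo * w
    while i < limit and a[i] == b[i]:  # at most w steps past the matched words
        i += 1
    return i


if __name__ == "__main__":
    print(lcp_bits("1100", "1010", w=2))  # 1
    print(lcp_bits("1011" + "0" * 60 + "1", "1011" + "0" * 60 + "0", w=8))  # 64
```

Recomputing each prefix hash from scratch keeps the sketch short; a practical version would precompute per-word prefix hashes so that each coarse probe costs O(1) and only the final word is inspected bit by bit.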
Full Text Available

PIM-Tree: A Skew-Resistant Index for Processing-in-Memory

https://doi.org/10.14778/3574245.3574275

Kang, Hongbo; Zhao, Yiwei; Blelloch, Guy E.; Dhulipala, Laxman; Gu, Yan; McGuffey, Charles; Gibbons, Phillip B. (December 2022, Proceedings of the VLDB Endowment)

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck by enabling low-latency memory access whose aggregate memory bandwidth scales with the number of PIM nodes. There is an inherent tension, however, between minimizing inter-node communication and achieving load balance in PIM systems in the presence of workload skew. This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in data and queries. Our skew-resistant index is based on a novel division of labor between the host CPU and PIM nodes that leverages the strengths of each. We introduce push-pull search, which dynamically decides, based on workload skew, whether to push queries to a PIM-tree node or pull the node's keys back to the CPU. Combined with other PIM-friendly optimizations (shadow subtrees and chunked skip lists), PIM-tree provides high throughput, (guaranteed) low communication, and (guaranteed) high load balance for batches of point queries, updates, and range scans. We implement PIM-tree, in addition to previously proposed PIM indexes, on the latest PIM system from UPMEM, with 32 CPU cores and 2048 PIM nodes. On workloads with 500 million keys and batches of 1 million queries, PIM-tree achieves up to 69.7× and 59.1× higher throughput than the two best prior PIM-based methods. As far as we know, these are the first implementations of an ordered index on a real PIM system.
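Push-pull search decides, per PIM-tree node, whether to send queries to the node or fetch the node's keys to the CPU. The sketch below illustrates one plausible form of that per-batch decision; the cost rule (pull a node when more queries target it than it holds keys) and the names `push_pull_plan`, `query_targets`, and `node_sizes` are assumptions for illustration, not the paper's exact criterion. What it tries to capture is the skew case: a hot node that attracts many queries is pulled once, instead of having all of those queries pushed to the single PIM node that stores it.

```python
from collections import Counter
from typing import Dict, List, Tuple


def push_pull_plan(
    query_targets: List[int], node_sizes: Dict[int, int]
) -> Tuple[Dict[int, int], List[int]]:
    """Split one batch into pushed queries and pulled nodes.

    query_targets: id of the tree node each query must visit next.
    node_sizes: number of keys stored at each node id.
    Returns (push, pull): push maps node id -> number of queries sent to the
    node's PIM module; pull lists node ids whose keys are copied to the CPU.
    Hypothetical rule: pull when the query count exceeds the node's key count,
    i.e. when moving the keys is cheaper than moving the queries.
    """
    load = Counter(query_targets)          # queries per target node
    push: Dict[int, int] = {}
    pull: List[int] = []
    for node, count in load.items():
        if count > node_sizes[node]:
            pull.append(node)              # hot node: fetch its keys once
        else:
            push[node] = count             # cold node: ship the few queries
    return push, sorted(pull)


if __name__ == "__main__":
    # Skewed batch: four queries hit node 7 (2 keys), one hits node 3 (16 keys).
    print(push_pull_plan([7, 7, 7, 7, 3], {7: 2, 3: 16}))  # ({3: 1}, [7])
```

Counting queries per node before dispatch is what lets the decision adapt to skew: under a uniform batch most nodes stay on the push side, while concentrated traffic flips individual hot nodes to pull.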
Full Text Available

The Processing-in-Memory Model

https://doi.org/10.1145/3409964.3461816

Kang, Hongbo; Gibbons, Phillip B.; Blelloch, Guy E.; Dhulipala, Laxman; Gu, Yan; McGuffey, Charles (July 2021, SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures)

Full Text Available

Sage: parallel semi-asymmetric graph algorithms for NVRAMs

https://doi.org/10.14778/3397230.3397251

Dhulipala, Laxman; McGuffey, Charles; Kang, Hongbo; Gu, Yan; Blelloch, Guy E.; Gibbons, Phillip B.; Shun, Julian (May 2020, Proceedings of the VLDB Endowment)
Full Text Available
